Rank | Count | Beginning |
---|---|---|
24570 | 1350 | W |
13800 | 601 | Nie |
12363 | 542 | Na |
8190 | 436 | Jeśli |
6 | 373 | A |
22878 | 366 | To |
17030 | 321 | Po |
27989 | 305 | Z |
6560 | 272 | I |
3227 | 269 | Czy |
7311 | 267 | Jak |
5088 | 220 | Dzięki |
8637 | 220 | Jest |
495 | 194 | Ale |
2657 | 184 | Co |
17 | 171 | Aby |
4157 | 167 | Do |
8988 | 136 | Jeżeli |
11903 | 128 | Możesz |
15410 | 127 | Od |
7245 | 124 | Ja |
3850 | 122 | Dla |
5971 | 117 | Gdy |
11791 | 113 | Moze |
9476 | 111 | Kiedy |
26726 | 105 | Wszystkie |
22111 | 104 | Tak |
12102 | 103 | Można |
1329 | 98 | Bardzo |
4004 | 93 | Dlatego |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
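The query above can be mirrored outside the database for quick experiments. The following is a minimal sketch in Python, not the code used to produce the table: the function name `sentence_beginnings`, its parameters, and the sample sentences are assumptions made for illustration. It treats a single space as the word separator, as `substring_index(sentence, ' ', N)` does, and takes N as a parameter so the same function covers N = 1, 2, 3, 4.

```python
from collections import Counter

def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent beginnings made of the first n words."""
    counts = Counter()
    for sentence in sentences:
        # Same idea as substring_index(sentence, ' ', n): keep everything
        # before the n-th space, or the whole sentence if it is shorter.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # Most frequent first, like "order by cnt desc limit 50".
    return counts.most_common(limit)

# Made-up sample sentences, for illustration only (not taken from the corpus).
SAMPLE = [
    "W domu jest kot.",
    "W lesie rosną grzyby.",
    "Nie wiem.",
    "Nie ma czasu.",
    "W końcu.",
]

print(sentence_beginnings(SAMPLE, n=1))
print(sentence_beginnings(SAMPLE, n=2, limit=3))
```

Calling the function with `n=2`, `n=3`, or `n=4` gives the counterpart of the later subsections, under the same assumption that words are separated by single spaces.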